Upgrade FAQ finance to Milvus 2.1 #3267

w5688414 · 2022-09-14T08:17:02Z

PR types

New features

PR changes

Docs

Description

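Intelligent question answering (QA) is one of the most direct and efficient ways to obtain information and knowledge. Traditional information retrieval methods can only find relevant documents, whereas intelligent QA returns a precise answer directly, which greatly reduces the time people spend looking things up. By technique, QA falls into two types: reading-comprehension QA, which locates the answer span within a passage of text, and retrieval-based QA, which matches the user's query against frequently asked questions and returns the stored answer. This project is retrieval-based QA. QA has a wide range of applications, such as search engines, smart hardware like the Xiaodu speaker, and intelligent customer service and chatbots in government, finance, banking, telecom, and e-commerce.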
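The retrieval-based flow described above can be illustrated with a toy sketch: embed the query, score it against each stored FAQ question, and return the answer attached to the best match. This is not the project's implementation. Bag-of-words counts with cosine similarity stand in, as an assumption, for the trained semantic model and Milvus vector index that the project uses. The FAQ entries and the helper names `embed`, `cosine`, and `answer` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy "embedding" (assumption): bag-of-words counts.
    # The real system would use a trained semantic encoder instead.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(query, faq):
    # Retrieval-based QA: find the stored question most similar to the
    # query, then return that question together with its stored answer.
    q = embed(query)
    best = max(faq, key=lambda stored: cosine(q, embed(stored)))
    return best, faq[best]


# Hypothetical FAQ data, for illustration only.
FAQ = {
    "how do i open a savings account": "Visit a branch with valid ID.",
    "how do i reset my online banking password": "Use the 'Forgot password' link on the login page.",
    "what is the daily transfer limit": "The limit depends on your account type.",
}

if __name__ == "__main__":
    print(answer("I forgot my online banking password, how do I reset it?", FAQ))
```

In a production system the linear `max` scan would be replaced by an approximate nearest-neighbor search over precomputed question vectors, which is the role a vector database such as Milvus plays.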

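This is a scenario-customized solution: users can train a system for their own specific scenario on their own data. To try out an FAQ intelligent QA system quickly, see the FAQ intelligent QA implementation in Pipelines.
For a detailed tutorial of this project (including data and code), see the AI Studio tutorial.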

tianxin1860

LGTM


w5688414 added 2 commits September 14, 2022 08:13

Upgrade FAQ finance to Milvus 2.1

fb88c75

Update text format for faq

cfccbd7

w5688414 requested review from tianxin1860 and wawltor September 14, 2022 08:27

w5688414 self-assigned this Sep 14, 2022

Update feature_extract.sh

81fa879

tianxin1860 approved these changes Sep 15, 2022


Merge branch 'develop' into faq3

173ae10

w5688414 merged commit 0f464e8 into PaddlePaddle:develop Sep 16, 2022

w5688414 mentioned this pull request Oct 13, 2022

PaddleNLP 2.4.1 Release Note Candidate #3448

Closed

w5688414 deleted the faq3 branch April 16, 2024 02:32
